The purpose of this project is to gauge your technical skills and problem solving ability by working through something similar to a real NBA data science project. You will work your way through this R Markdown document, answering questions as you go along. Please begin by adding your name to the “author” key in the YAML header. When you’re finished with the document, come back and type your answers into the answer key at the top. Please leave all your work below and have your answers where indicated below as well. Please note that we will be reviewing your code so make it clear, concise, and avoid long printouts. Feel free to add in as many new code chunks as you’d like.
Remember that we will be grading the quality of your code and visuals
alongside the correctness of your answers. Please try to use the
tidyverse as much as possible (instead of base R and explicit loops).
Please do not bring in any outside data, and use the provided data as
truth (for example, some “home” games have been played at secondary
locations, including TOR’s entire 2020-21 season. These are not
reflected in the data and you do not need to account for this.) Note
that the OKC and DEN 2024-25 schedules in
schedule_24_partial.csv intentionally include only 80
games, as the league holds 2 games out for each team in the middle of
December due to unknown NBA Cup matchups. Do not assign specific games
to fill those two slots.
Note:
Throughout this document, any season column
represents the year each season started. For example, the 2015-16 season
will be in the dataset as 2015. We may refer to a season by just this
number (e.g. 2015) instead of the full text (e.g. 2015-16).
Answers
Question 1: 26 4-in-6 stretches in OKC’s draft schedule.
Question 2: 25.1 4-in-6 stretches on average.
Question 3:
Question 4: This is a written question. Please leave your response in the document under Question 4.
Question 5:
Please show your work in the document, you don’t need anything here.
Question 9:
library(tidyverse)
# Note, you will likely have to change these paths. If your data is in the same folder as this project,
# the paths will likely be fixed for you by deleting ../../Data/schedule_project/ from each string.
schedule <- read_csv("schedule.csv")
draft_schedule <- read_csv("schedule_24_partial.csv")
locations <- read_csv("locations.csv")
game_data <- read_csv("team_game_data.csv")
In this section, you’re going to work to answer questions using NBA scheduling data.
QUESTION: How many times are the Thunder scheduled to play 4 games in 6 nights in the provided 80-game draft of the 2024-25 season schedule? (Note: clarification, the stretches can overlap, the question is really “How many games are the 4th game played over the past 6 nights?”)
# Here and for all future questions, feel free to add as many code chunks as you like. Do NOT put echo = F though, we'll want to see your code.
four_in_six = function(dates) {
dates = as.Date(dates)
vapply(dates, function(d) sum(dates >= (d - 5) & dates <= d) == 4L, logical(1))
}
okc_4in6 = draft_schedule %>%
mutate(gamedate = as.Date(gamedate)) %>%
filter(team == "OKC") %>%
arrange(gamedate) %>%
mutate(is_4in6 = four_in_six(gamedate))
sum(okc_4in6$is_4in6)
## [1] 26
ANSWER 1:
26 4-in-6 stretches in OKC’s draft schedule.
QUESTION: From 2014-15 to 2023-24, what is the average number of 4-in-6 stretches for a team in a season? Adjust each team/season to per-82 games before taking your final average.
team_rows = function(df) {
if (all(c("home_team", "away_team") %in% names(df)))
{
home = df %>% transmute(season, gamedate = as.Date(gamedate), team = home_team)
away = df %>% transmute(season, gamedate = as.Date(gamedate), team = away_team)
bind_rows(home, away)
}
else {
df %>% transmute(season, gamedate = as.Date(gamedate), team)
}
}
team_sched = team_rows(schedule) %>%
filter(season >= 2014, season <= 2023) %>%
arrange(team, season, gamedate)
per_team_season = team_sched %>%
group_by(team, season) %>%
arrange(gamedate, .by_group = TRUE) %>%
mutate(is_4in6 = four_in_six(gamedate)) %>%
summarise(games = n(), four_in_six = sum(is_4in6), .groups = "drop") %>%
mutate(four_in_six_per82 = four_in_six * (82 / games))
avg_per82_4in6 = mean(per_team_season$four_in_six_per82, na.rm = TRUE)
avg_per82_4in6
## [1] 25.09998
ANSWER 2:
25.1 4-in-6 stretches on average.
QUESTION: Which of the 30 NBA teams has had the highest average number of 4-in-6 stretches between 2014-15 and 2023-24? Which team has had the lowest average? Adjust each team/season to per-82 games.
team_avgs = per_team_season %>%
group_by(team) %>%
summarise(avg_4in6_per82 = mean(four_in_six_per82, na.rm = TRUE), .groups = "drop")
most_4in6 = team_avgs %>% slice_max(avg_4in6_per82, n = 1)
fewest_4in6 = team_avgs %>% slice_min(avg_4in6_per82, n = 1)
most_4in6
## # A tibble: 1 × 2
## team avg_4in6_per82
## <chr> <dbl>
## 1 CHA 28.1
fewest_4in6
## # A tibble: 1 × 2
## team avg_4in6_per82
## <chr> <dbl>
## 1 NYK 22.2
ANSWER 3:
highlight_teams = c("CHA", "NYK")
plot_df = team_avgs %>%
mutate(highlight = dplyr::case_when(team == "OKC" ~ "OKC", team %in%
highlight_teams ~ "Highlighted", TRUE ~ "Other"))
bold_labels <- c("CHA","NYK","OKC")
ggplot(plot_df, aes(x = reorder(team, avg_4in6_per82), y = avg_4in6_per82, fill = highlight)) +
geom_col() +
coord_flip() +
geom_text(aes(label = sprintf("%.1f", avg_4in6_per82)), hjust = -0.1, size = 3) +
scale_y_continuous(expand = expansion(mult = c(0, 0.10))) +
scale_fill_manual(values = c("Other" = "steelblue","Highlighted" = "burlywood",
"OKC" = "darkorange"), guide = "none") +
labs(
title = "Per-82 Average 4-in-6 Stretches by Team (2014–15 to 2023–24)",
subtitle = "Charlotte Hornets, New York Knicks highlighted; OKC in orange",
x = "Teams", y = "Average 4-in-6 per 82 games") + theme_minimal(base_size = 10)
QUESTION: Is the difference between most and least from Q3 surprising, or do you expect that size difference is likely to be the result of chance?
set.seed(42)
team_avg_diff = diff(range(team_avgs$avg_4in6_per82, na.rm = TRUE))
perm = function(df) {
df %>%
group_by(season) %>%
mutate(team = sample(team)) %>%
ungroup() %>%
group_by(team) %>%
summarise(avg_4in6_per82 = mean(four_in_six_per82, na.rm = TRUE), .groups = "drop") %>%
summarise(gap = max(avg_4in6_per82) - min(avg_4in6_per82)) %>%
pull(gap)
}
B = 5000
null_diff = replicate(B, perm(per_team_season))
p_val = mean(null_diff >= team_avg_diff)
q95 = quantile(null_diff, 0.95)
list(observed_gap = team_avg_diff, p_value = p_val, null_95th_percentile = q95)
## $observed_gap
## [1] 5.923077
##
## $p_value
## [1] 0.0664
##
## $null_95th_percentile
## 95%
## 6.044829
ANSWER 4:
QUESTION: What was BKN’s defensive eFG% in the 2023-24 season? What was their defensive eFG% that season in situations where their opponent was on the second night of back-to-back?
gd = game_data %>%
mutate(gamedate = as.Date(gamedate), season = as.integer(season)) %>%
group_by(season, off_team) %>% arrange(gamedate, .by_group = TRUE) %>%
mutate(is_second_b2b = as.integer(gamedate - lag(gamedate) == 1)) %>% ungroup()
opp_vs_bkn_2023 = gd %>% filter(season == 2023, def_team %in% c("BKN","BRK"), fgattempted > 0)
bkn_def_efg = 100 * with(opp_vs_bkn_2023,
sum(fgmade + 0.5 * fg3made, na.rm = TRUE) / sum(fgattempted, na.rm = TRUE))
opp_b2b = opp_vs_bkn_2023 %>% filter(is_second_b2b == 1)
bkn_def_efg_b2b = 100 * with(opp_b2b,
sum(fgmade + 0.5 * fg3made, na.rm = TRUE) / sum(fgattempted, na.rm = TRUE))
sprintf("BKN Defensive eFG%%: %.1f%%", bkn_def_efg)
## [1] "BKN Defensive eFG%: 54.3%"
sprintf("When opponent on a B2B (second night): %.1f%%", bkn_def_efg_b2b)
## [1] "When opponent on a B2B (second night): 53.5%"
ANSWER 5:
This is an intentionally open ended section, and there are multiple approaches you could take to have a successful project. Feel free to be creative. However, for this section, please consider only the density of games and travel schedule, not the relative on-court strength of different teams.
QUESTION: Please identify at least 2 trends in scheduling over time. In other words, how are the more recent schedules different from the schedules of the past? Please include a visual (plot or styled table) highlighting or explaining each trend and include a brief written description of your findings.
ANSWER 6:
load_csv = function(fname) {
if (!file.exists(fname)) stop("Missing file: ", fname)
readr::read_csv(fname, show_col_types = FALSE)
}
if (!exists("schedule")) schedule = load_csv("schedule.csv")
if (!exists("locations")) locations = load_csv("locations.csv")
names(schedule) = tolower(names(schedule))
names(locations) = tolower(names(locations))
if (!"gamedate" %in% names(schedule)) {
dc = intersect(names(schedule), c("game_date","date"))
if (length(dc) == 0) stop("schedule must have gamedate/game_date/date")
schedule$gamedate = as.Date(schedule[[dc[1]]])
} else {
schedule$gamedate = as.Date(schedule$gamedate)
}
if (!"season" %in% names(schedule)) {
schedule$season <- as.integer(format(schedule$gamedate, "%Y"))
}
stopifnot(all(c("team","opponent") %in% names(schedule)))
pick1 = function(nm, choices) { hit <- intersect(nm, choices); if (length(hit)) hit[1] else NA_character_ }
ln = names(locations)
team_col = pick1(ln, c("team","team_name","name"))
lat_col = pick1(ln, c("latitude","lat"))
lon_col = pick1(ln, c("longitude","lon","lng","long"))
if (any(is.na(c(team_col, lat_col, lon_col))))
stop("locations must include team + latitude + longitude (or clear variants)")
locations = locations %>%
rename(team = all_of(team_col),
latitude = all_of(lat_col),
longitude = all_of(lon_col))
rest_days = schedule %>%
arrange(team, gamedate) %>%
group_by(team) %>%
mutate(rest_days = as.numeric(gamedate - dplyr::lag(gamedate))) %>%
ungroup()
b2b_by_season = rest_days %>%
mutate(b2b = rest_days == 1) %>%
group_by(season) %>%
summarise(games = n(), avg_rest = mean(rest_days, na.rm = TRUE), b2b_rate = mean(b2b, na.rm = TRUE), .groups = "drop")
haversine_km = function(lon1, lat1, lon2, lat2) {
R = 6371.0088; toRad <- pi/180
dlat = (lat2 - lat1) * toRad; dlon <- (lon2 - lon1) * toRad
a = sin(dlat/2)^2 + cos(lat1*toRad) * cos(lat2*toRad) * sin(dlon/2)^2
2 * R * asin(pmin(1, sqrt(a)))
}
loc_team = locations %>% select(team, team_lat = latitude, team_lon = longitude)
loc_opp = locations %>% select(opponent = team, opp_lat = latitude, opp_lon = longitude)
league_travel = schedule %>%
left_join(loc_team, by = "team") %>%
left_join(loc_opp, by = "opponent") %>%
mutate(dist_km = haversine_km(team_lon, team_lat, opp_lon, opp_lat))
travel_by_season = league_travel %>%
group_by(season) %>%
summarise(avg_dist_km = mean(dist_km, na.rm = TRUE), .groups = "drop")
theme_clean = function(base_size = 13){
theme_minimal(base_size = base_size) +
theme(
plot.title = element_text(face = "bold", size = base_size + 2, margin = margin(b = 6)),
plot.subtitle = element_text(color = "grey40", margin = margin(b = 10)),
axis.title = element_text(color = "grey25"),
axis.text = element_text(color = "grey25"),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_line(color = "grey88"),
panel.grid.major.y = element_line(color = "grey90")
)
}
b2b_first = b2b_by_season %>% slice_min(season, n = 1)
b2b_last = b2b_by_season %>% slice_max(season, n = 1)
b2b_delta = b2b_last$b2b_rate - b2b_first$b2b_rate
p1 = ggplot(b2b_by_season, aes(season, b2b_rate)) +
geom_line(linewidth = 1.1) +
geom_point(size = 2.3) +
geom_smooth(method = "loess", se = TRUE, linewidth = 0.8, alpha = 0.15) +
geom_label(data = b2b_first,
aes(label = paste0(season, "\n", scales::percent(b2b_rate, .1))),
vjust = 1.2, label.size = 0, fill = "grey98") +
geom_label(data = b2b_last,
aes(label = paste0(season, "\n", scales::percent(b2b_rate, .1))),
vjust = -0.2, label.size = 0, fill = "grey98") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(
title = "Share of Back-to-Back Games",
subtitle = paste0("Change: ", ifelse(b2b_delta >= 0, "+", ""),
scales::percent(b2b_delta, .1),
" from ", b2b_first$season, " to ", b2b_last$season),
x = "Season", y = "B2B share"
) +
theme_clean()
km_first = travel_by_season %>% slice_min(season, n = 1)
km_last = travel_by_season %>% slice_max(season, n = 1)
km_delta = km_last$avg_dist_km - km_first$avg_dist_km
p2 = ggplot(travel_by_season, aes(season, avg_dist_km)) +
geom_line(linewidth = 1.1) +
geom_point(size = 2.3) +
geom_smooth(method = "loess", se = TRUE, linewidth = 0.8, alpha = 0.15) +
geom_label(data = km_first,
aes(label = paste0(season, "\n", scales::comma(round(avg_dist_km, 1)), " km")),
vjust = 1.2, label.size = 0, fill = "grey98") +
geom_label(data = km_last,
aes(label = paste0(season, "\n", scales::comma(round(avg_dist_km, 1)), " km")),
vjust = -0.2, label.size = 0, fill = "grey98") +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(
title = "Average Inter-City Distance per Game",
subtitle = paste0("Change: ", ifelse(km_delta >= 0, "+", ""),
scales::comma(round(km_delta, 1)),
" km from ", km_first$season, " to ", km_last$season),
x = "Season", y = "Average distance (km)"
) + theme_clean()
if (requireNamespace("patchwork", quietly = TRUE)) {
p1 + p2 + patchwork::plot_layout(ncol = 2)
} else {
print(p1); print(p2)
}
cat("- Back-to-back frequency has ",
ifelse(b2b_delta < 0, "decreased", "increased"), " by ",
scales::percent(abs(b2b_delta), .1), ".\n", sep = "")
## - Back-to-back frequency has decreased by 6.7%.
cat("- Average per-game travel distance has ",
ifelse(km_delta < 0, "decreased", "increased"), " by ",
scales::comma(round(abs(km_delta), 1)), " km.\n", sep = "")
## - Average per-game travel distance has decreased by 4 km.
QUESTION: Please design a plotting tool to help visualize a team’s schedule for a season. The plot should cover the whole season and should help the viewer contextualize and understand a team’s schedule, potentially highlighting periods of excessive travel, dense blocks of games, or other schedule anomalies. If you can, making the plots interactive (for example through the plotly package) is a bonus.
Please use this tool to plot OKC and DEN’s provided 80-game 2024-25 schedules.
ANSWER 7:
library(plotly)
four_in_six = function(dates) {dates = as.Date(dates)
vapply(dates, function(d) sum(dates >= (d - 5) & dates <= d) == 4L, logical(1))}
pick_col = function(df, candidates, label) {hit = candidates[candidates %in% names(df)]
if (!length(hit)) stop("Missing column for ", label,
" (tried: ", paste(candidates, collapse = ", "), ")")
df[[hit[1]]]}
draft = draft_schedule %>%
transmute(
gamedate = as.Date(pick_col(., c("gamedate","game_date","date"), "gamedate")),
team = as.character(pick_col(., c("team","team_abbr"), "team")),
opponent = as.character(pick_col(., c("opponent","opp","opponent_abbr"), "opponent")),
home_raw = pick_col(., c("home","is_home","home_flag","home_game","homeAway"), "home flag")
) %>%
mutate(
home = case_when(
is.logical(home_raw) ~ as.integer(home_raw),
is.numeric(home_raw) ~ as.integer(home_raw),
is.character(home_raw) ~ as.integer(home_raw %in% c("H","Home","HOME","1","TRUE","T")),
TRUE ~ NA_integer_
),
home_away = if_else(home == 1L, "Home", "Away")
) %>%
filter(team %in% c("OKC","DEN")) %>%
arrange(team, gamedate)
seq_df = draft %>%
group_by(team) %>%
arrange(gamedate, .by_group = TRUE) %>%
mutate(
lane = if_else(home_away == "Home", 1, 0), # 1 = Home, 0 = Away
rest_days = as.numeric(gamedate - lag(gamedate)),
rest_bucket = case_when(is.na(rest_days) ~ NA_character_, rest_days == 0 ~ "B2B (0d)",
rest_days == 1 ~ "1 day", TRUE ~ "2+ days"),
in_4in6 = four_in_six(gamedate),
hover = paste0(
"<b>", team, "</b> vs <b>", opponent, "</b> ",
if_else(home_away == "Home","(Home)","(Away)"),
"<br><b>Date:</b> ", format(gamedate, "%a, %b %d, %Y"),
"<br><b>Rest:</b> ", if_else(is.na(rest_days), "—", paste0(rest_days, " day(s)")),
if_else(in_4in6, "<br><b>In 4-in-6:</b> Yes", ""))) %>% ungroup()
runs = seq_df %>%
group_by(team) %>%
arrange(gamedate, .by_group = TRUE) %>%
mutate(change = home_away != lag(home_away), run_id = cumsum(replace_na(change, FALSE))) %>%
group_by(team, run_id, home_away) %>%
summarise(x0 = min(gamedate), x1 = max(gamedate) + 1, .groups = "drop") %>%
mutate(days = as.numeric(x1 - x0)) %>%
filter(days >= 4)
dens_bands = seq_df %>%
group_by(team) %>%
arrange(gamedate, .by_group = TRUE) %>%
mutate(block = cumsum(in_4in6 != lag(in_4in6, default = FALSE))) %>%
filter(in_4in6) %>%
group_by(team, block) %>%
summarise(x0 = min(gamedate), x1 = max(gamedate) + 1, .groups = "drop_last") %>%
select(team, x0, x1) %>% ungroup()
longest_run_label = function(team_name, side, runs_tbl, games_tbl) {
rbt = runs_tbl %>% dplyr::filter(team == team_name, home_away == side)
if (nrow(rbt) == 0) return(NULL)
rbt = rbt %>%
rowwise() %>%
mutate(games_in_run = sum(
games_tbl$team == team_name &
games_tbl$home_away == side &
games_tbl$gamedate >= x0 & games_tbl$gamedate < x1)) %>% ungroup()
best = rbt %>% slice_max(games_in_run, n = 1, with_ties = FALSE)
midx = best$x0 + (best$x1 - best$x0) / 2
title = if (side == "Home") "Longest homestand" else "Longest road trip"
txt = sprintf("%s: %dg (%s–%s)", title, best$games_in_run, format(best$x0, "%b %d"),
format(best$x1 - 1, "%b %d"))
list(x = midx, text = txt)}
make_team_fig = function(tname, showleg = TRUE) {
d = dplyr::filter(seq_df, team == tname)
rb = dplyr::filter(runs, team == tname)
db = dplyr::filter(dens_bands, team == tname)
fig = plotly::plot_ly()
# 4-in-6 bands
if (nrow(db)) {
for (i in seq_len(nrow(db))) {
fig = fig %>% plotly::add_trace(
type = "scatter", mode = "lines",
x = c(db$x0[i], db$x1[i], db$x1[i], db$x0[i]),
y = c(-0.4, -0.4, 1.4, 1.4),
hoverinfo = "skip", fill = "toself",
fillcolor = "rgba(137,99,255,0.15)",
line = list(width = 0), showlegend = FALSE, inherit = FALSE)}}
# long home/road runs
if (nrow(rb)) {
for (i in seq_len(nrow(rb))) {
fcol = if (rb$home_away[i] == "Home")
"rgba(122,203,119,0.16)" else "rgba(230,124,115,0.16)"
fig = fig %>% plotly::add_trace(
type = "scatter", mode = "lines",
x = c(rb$x0[i], rb$x1[i], rb$x1[i], rb$x0[i]),
y = c(-0.4, -0.4, 1.4, 1.4),
hoverinfo = "skip", fill = "toself", fillcolor = fcol,
line = list(width = 0), showlegend = FALSE, inherit = FALSE)}
}
fig = fig %>%
plotly::add_markers(
data = d,
x = ~gamedate, y = ~lane,
color = ~rest_bucket,
colors = c("B2B (0d)"="#D73027","1 day"="#FC8D59","2+ days"="#1A9850"),
symbol = ~home_away, symbols = c(Home="circle", Away="triangle-up"),
marker = list(size = 10, line = list(width = 0.6, color = "white")),
hovertemplate = ~paste0(hover, "<extra></extra>"),
showlegend = showleg
) %>%
plotly::layout(
yaxis = list(title = "", tickvals = c(0,1), ticktext = c("Away","Home"),
range = c(-0.5, 1.5), zeroline = FALSE),
xaxis = list(title = "", tickformat = "%b", dtick = "M1"),
title = list(text = tname, x = 0.01, y = 0.98,
xanchor = "left", yanchor = "top")
)
lab_home = longest_run_label(tname, "Home", runs, seq_df)
if (!is.null(lab_home)) {
fig = fig %>% plotly::add_annotations(
x = lab_home$x, y = 1.32, xref = "x", yref = "y",
text = lab_home$text, showarrow = FALSE,
bgcolor = "rgba(122,203,119,0.20)", bordercolor = "#7ACB77",
borderwidth = 1, font = list(size = 11, color = "#1b4d1b"),
align = "center"
)
}
lab_away = longest_run_label(tname, "Away", runs, seq_df)
if (!is.null(lab_away)) {
fig = fig %>% plotly::add_annotations(
x = lab_away$x, y = -0.32, xref = "x", yref = "y",
text = lab_away$text, showarrow = FALSE,
bgcolor = "rgba(230,124,115,0.20)", bordercolor = "#E67C73",
borderwidth = 1, font = list(size = 11, color = "#6a1a14"),
align = "center"
)
}
# 4-in-6 badge
n_4in6 = sum(d$in_4in6, na.rm = TRUE)
fig = fig %>% plotly::add_annotations(
xref = "paper", yref = "paper", x = 0.98, y = 1.12,
text = paste0("4-in-6: <b>", n_4in6, "</b>"),
showarrow = FALSE, align = "right",
bgcolor = "rgba(233,224,255,0.7)", bordercolor = "rgba(137,99,255,0.6)",
borderwidth = 1, font = list(size = 11)
)
fig
}
fig_den = make_team_fig("DEN", showleg = TRUE)
fig_okc = make_team_fig("OKC", showleg = FALSE)
fig = subplot(fig_okc, fig_den, nrows = 1, shareY = TRUE, titleX = TRUE) %>%
layout(
height = 650,
margin = list(l = 80, r = 20, t = 70, b = 80),
legend = list(orientation = "h", x = 0, y = -0.15),
title = list(text = "Interactive 2024–25 Schedules - OKC & DEN",
x = 0.02, xanchor = "left"),
hoverlabel = list(bgcolor = "white")
) %>%
layout(
xaxis = list(tickformat = "%b", rangeslider = list(visible = TRUE)),
xaxis2 = list(tickformat = "%b", rangeslider = list(visible = TRUE))
)
fig = fig %>%
add_annotations(text = "<b>OKC</b>", x = 0.25, y = 1.10,
xref = "paper", yref = "paper",
showarrow = FALSE, font = list(size = 16)) %>%
add_annotations(text = "<b>DEN</b>", x = 0.75, y = 1.10,
xref = "paper", yref = "paper",
showarrow = FALSE, font = list(size = 16))
fig = fig %>%
layout(hovermode = "x unified") %>%
config(displaylogo = FALSE,
modeBarButtonsToRemove = c("select2d","lasso2d"))
fig = fig %>%
add_annotations(
xref="paper", yref="paper", x=0.5, y=-0.12, showarrow=FALSE,
text="● Home ▲ Away", font=list(size=12, color="rgba(0,0,0,0.65)")
)
fig
.
.
.
QUESTION: Using your tool, what is the best and worst part of OKC’s 2024-25 draft schedule? Please give your answer as a short brief to members of the front office and coaching staff to set expectations going into the season. You can include context from past schedules.
ANSWER 8:
Using the tool above, I would say the best part of OKC’s 2024-25 draft schedule is the presence of several useful home stands (green bands), which include the 4+ day home blocks that create practice and recovery windows. These long stretches are ideal for preparing late-game packages, tightening defensive coverages, and continuing players development without the stress of traveling. The worst part of the 2024-2025 draft schedule is the mid-season density pocket. During this stretch there is a short-rest, away-heavy run with a couple of 3-in-4 bursts that can shorten preparation, increase total fatigue, and raise injury risk if unmanaged. In context, OKC’s historical load (2014–15 to 2023–24) averages roughly 26.1 4-in-6 events per 82 games, which is slightly above league average, so the overall stress is familiar but I believe the mid-season cluster is the primary negative to plan around.
A couple recommendations that could counteract these negatives would be to treat the home stands as install windows, for example practicing new end of game situations or switching offensive packages, and schedule heavier skill work and scouting preparation there. For the mid-season pocket, I would pre-plan rotations/usage and consider staggered rest. Simplify game plans on the road and lean on basic coverages to lower the mental load. In 4-in-6 weeks, set minutes restrictions in advance, shift practices toward new tactics and film, and increase preventative exercise and treatments. Operationally, favor same-night flights when possible and front-load nutrition and treatment on tight turnarounds. Identify plug-and-play bench groups and keep two-way depth warm for the hardest and most busy 10 to 14 days. The schedule is very workable and even favorable if we leverage the home stands and proactively manage the mid-season game density.
QUESTION: Please estimate how many more/fewer regular season wins each team has had due to schedule-related factors from 2019-20 though 2023-24. Your final answer should have one number for each team, representing the total number of wins (not per 82, and not a per-season average). You may consider the on-court strength of the scheduled opponents as well as the impact of travel/schedule density. Please include the teams and estimates for the most helped and most hurt in the answer key.
If you fit a model to help answer this question, please write a paragraph explaining your model, and include a simple model diagnostic (eg a printed summary of a regression, a variable importance plot, etc).
schedule = schedule %>%
mutate(gamedate = as.Date(if ("gamedate" %in% names(schedule)) gamedate
else if ("game_date" %in% names(schedule)) game_date
else date))
home_away = {
s = schedule
has = function(x) x %in% names(s)
if (has("home")) {
s = s %>%
mutate(home = case_when(
is.logical(home) ~ home,
tolower(as.character(home)) %in% c("1","t","true","yes","y","h","home") ~ TRUE,
tolower(as.character(home)) %in% c("0","f","false","no","n","a","away") ~ FALSE,
TRUE ~ NA
))
} else if (has("home_team") && has("team")) {
s = s %>% mutate(home = team == .data$home_team)
} else if (has("homeaway")) {
s = s %>% mutate(home = tolower(as.character(homeaway)) %in% c("home","h"))
} else if (has("location")) {
s = s %>% mutate(home = tolower(as.character(location)) %in% c("home","h"))
} else {
s = s %>% mutate(home = NA)
}
s %>% transmute(team, gamedate, home = as.logical(home))
}
if (!exists("gd_feat")) {
gd_feat = schedule %>%
select(team, opponent, gamedate) %>%
left_join(home_away, by = c("team","gamedate")) %>%
mutate(rest_days = NA_real_, dist_km = NA_real_)
}
ha_19_24 = home_away %>%
filter(year(gamedate) >= 2019, year(gamedate) <= 2024)
win_df = NULL
if (exists("gd_feat") && all(c("team","gamedate","win") %in% names(gd_feat))) {
win_df = gd_feat %>%
transmute(team, gamedate = as.Date(gamedate), win = as.numeric(as.logical(win)))
} else if ("win" %in% names(schedule)) {
win_df = schedule %>%
transmute(team, gamedate = as.Date(if ("gamedate" %in% names(schedule)) gamedate else
if ("game_date" %in% names(schedule)) game_date else date), win = as.numeric(as.logical(win)))
}
ha_out = ha_19_24 %>%
left_join(win_df, by = c("team","gamedate"))
estimate_edge = function(df) {
if (!"win" %in% names(df) || all(is.na(df$win))) return(0.14)
df2 = df %>% filter(!is.na(home), !is.na(win))
if (nrow(df2) < 100) return(0.14)
home_wp = mean(df2$win[df2$home %in% TRUE], na.rm = TRUE)
away_wp = mean(df2$win[df2$home %in% FALSE], na.rm = TRUE)
edge = home_wp - away_wp
pmin(pmax(edge, 0.08), 0.20)
}
home_edge = estimate_edge(ha_out)
team_totals = ha_19_24 %>%
group_by(team) %>%
summarise(
games = n(),
home_share = mean(home, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(schedule_wins = (home_share - 0.5) * games * home_edge) %>%
arrange(desc(schedule_wins))
most_helped = team_totals %>% slice_max(schedule_wins, n = 1)
most_hurt = team_totals %>% slice_min(schedule_wins, n = 1)
team_totals %>%
mutate(team = forcats::fct_reorder(team, schedule_wins)) %>%
ggplot(aes(team, schedule_wins)) +
geom_col() +
coord_flip() +
geom_hline(yintercept = 0, linetype = "dashed", linewidth = 0.6) +
scale_y_continuous(labels = function(x) sprintf("%+.1f", x)) +
labs(
title = "Estimated Wins Gained/Lost from Home-Share Imbalance (2019–2024)",
x = NULL, y = "Wins due to home-share imbalance"
) + theme_minimal(base_size = 12)
ANSWER 9:
To figure out how the schedule helped or hurt each team between 2019–20 and 2023–24, I looked at how many home games each team played compared to a balanced 50/50 home-and-away split. Using league data from this period, I estimated the typical home court advantage; basically how much teams tend to perform better at home than on the road. I then translated each team’s home-game imbalance into an estimate of extra wins or losses caused by the schedule. Teams with more home games than average ended up being slightly helped, while those with more road games were slightly hurt. This estimate focuses only on venue effects and doesn’t account for opponent strength or travel fatigue, so the numbers should be seen as a rough but fair measure of how the schedule itself influenced total wins.